Historic,  Archive  Document 

Do  not  assume  content  reflects  current 
scientific  knowledge,  policies,  or  practices. 




UNITED  STATES  DEPARTMENT  OF  AGRICULTURE 
Agricultural  Marketing  Service 


INTERPRETATION  OF  VARIABILITY  IN  THE  DATA  USED 
IN  PREPARING  THE  COTTON  GRADE  AND  STAPLE  REPORTS 

( Revised ) 


Prepared by
F.  H.  Harper 
In  July  1934,  while  a  member  of  the  Staff 

of  the 

Bureau  of  Agricultural  Economics 


Washington,  D.  C. 
October  1939 


Interpretation  of  Variability  in  the  Sample  Data  Used  in 
Preparing  the  Cotton  Grade  and  Staple  Reports 

by 

F. H. Harper


During the past decade a particularly keen interest has developed in
the possibilities of improving sampling methods and in evaluating the
adequacy and reliability of sample data selected to represent
stratified universes. Numerous contributions have been made in this
branch of statistical technique, and there has been marked advancement
in the appraisal and application of different sampling procedures.
The preparation of this paper purports to indicate some of the
advantages of refinement in the analysis of variability in paired
data, to demonstrate application that has been made of results
obtained from analyses of variability in the allocation of samples
used as a basis of reports on the grade and staple length of the
American cotton crop, and to indicate how the variability in the
sample data used in preparing the reports has been evaluated and
interpreted.

The analysis of problems involving quality of cotton produced in the
United States is facilitated by the use of reports on the grades and
staple lengths of cotton ginned. The usefulness of these reports,
issued by the United States Department of Agriculture, and now
available for the entire domestic crops of 1928 to 1933, inclusive,
will depend somewhat upon the extent to which agricultural workers and
others using them acquaint themselves with the data and the procedure
followed in assembling them.

In 1927, Congress enacted legislation (Act of March 3, 1927, Public
No. 740 - 69th Congress, 44 Stat. 1372-1374) authorizing and directing
the Secretary of Agriculture to collect and publish annually
"statistics or estimates concerning the grades and staple lengths of
stocks of cotton on hand on the 1st of August of each year in
warehouses and other establishments of every character in the
continental United States," and also to publish "not less than three
such estimates with respect to each crop." It is further provided that
"In any such statistics or estimates published, the cotton which on
the date for which such statistics are published may be recognized as
tenderable on contracts of sale of cotton for future delivery under
the United States Cotton Futures Act of August 11, 1916, as amended,
shall be stated separately from that which may be untenderable under
said Act as amended."

The Division of Cotton Marketing, Bureau of Agricultural Economics,
inaugurated in 1928, in compliance with the provisions of the Act, the
preparation of reports on the grade, staple length, and tenderability
of the entire domestic crop, funds for the purpose becoming available
for the fiscal year beginning July 1, 1928. Preliminary work of

1/ The first report on the 1928 crop was issued on September 28,
1928, and had reference to cotton ginned in the United States prior to
September 1. The first report prepared in compliance with the Act on
the "carry-over" of American cotton was issued on September 21, 1928,
and referred to stocks of cotton on hand in the United States at the
beginning of the cotton marketing year on August 1.




preparing reports on the grade, staple length, and tenderability of
ginnings had already been undertaken in 1927 by the Division of Cotton
Marketing, Bureau of Agricultural Economics, under the general
authority contained in the Agricultural Appropriation Act of January
18, 1927, and with funds made available for the fiscal year beginning
July 1, 1927. 2/

Partly because the available funds were inadequate for a survey of the
entire Cotton Belt, and partly because the work was not inaugurated
early enough in 1927 to permit the establishment of an organization
sufficient for that purpose, the inquiry pertaining to the 1927 crop,
or the 1927-28 season, was of restricted scope, covering only the
State of Georgia, 20 counties in northern Texas, 3/ and 7 counties in
the contiguous part of southwestern Oklahoma. 4/

Prior to the inauguration in 1927 of this preliminary work by the
Division of Cotton Marketing, some earlier attempts had been made to
furnish information on staple length of cotton produced and ginned in
the United States. 5/ The Bureau of Crop Estimates 6/ of the United

2/ The first report of the 1927-28 season was issued on October 4,
1927, and had reference to cotton ginned in the State of Georgia prior
to September 1. Funds for carrying on the work were provided from those
originally allotted to the Division of Crop and Livestock Estimates.

3/ Including Baylor, Childress, Collingsworth, Cottle, Dickens,
Foard, Hall, Hardeman, King, Knox, Motley, Wichita, Wilbarger, Crosby,
Donley, Floyd, Hale, Hockley, Lamb, and Lubbock counties.

4/    Including  Comanche,   Cotton,  Greer,  Harmon,  Jackson,  Kiowa,  and 
Tillman  counties. 

5/ Results of early measurements of cotton fiber lengths were
published by the Census Office of the Department of the Interior in
Volume 5 of the 1880 Census, pages 14 to 35, inclusive.

6/ The Division of Statistics, established May 28, 1863, became the
Bureau of Statistics in 1903, and this Bureau became the Bureau of
Crop Estimates on July 1, 1914. The Bureau of Crop Estimates was
combined with the Bureau of Markets on July 1, 1921, to form the Bureau
of Markets and Crop Estimates. On July 1, 1922, the Office of Farm
Management and Farm Economics was combined with the latter to form the
Bureau of Agricultural Economics, at which time the Division of Crop
and Livestock Estimates was established.




States Department of Agriculture, organized in 1914, prepared estimates
on the staple length of cotton ginned from the 1914 crop. 7/ The
inquiry concerning the length of staple specified long staple as being
1-3/16 inches and upward in length. 8/ In preparing estimates
concerning the staple length of the 1915 crop, long staple cotton was
considered to be that having a length of 1-1/8 inches and more. The
preparation of staple-length estimates 9/ was discontinued in 1926 by
the Division of Crop and Livestock Estimates, which from the time of
its establishment in 1922 had continued the staple-length estimates
inaugurated by the Bureau of Crop Estimates. The last of these prepared
reports for the United States had reference to the 1925 crop. 10/

7/ Farmers' Bulletin 651, February 6, 1915, pp. 12-13, and Cotton
Production and Distribution, Bureau of the Census, Bulletin 1831,
page 24.

8/ Monthly Crop Report, June, 1916, page 50, and Farmers' Bulletin
651, February 6, 1915, page 12.

9/ Information relative to these estimates is readily accessible as
follows: 1914 crop, Farmers' Bulletin 661; 1915 crop, Monthly Crop
Report, June, 1916, page 51, and Monthly Crop Report, June, 1917,
pp. 52-53; 1916 crop, Monthly Crop Report, June, 1917, pp. 52-53, and
United States Department of Agriculture Bulletin 733, pages 7 and 8;
1917 crop, United States Department of Agriculture Bulletin 733,
pages 7 and 8; 1918 crop, Monthly Crop Reporter, June, 1920, page 52;
1919 crop, Monthly Crop Reporter, June, 1920, page 52, and Monthly
Crop Reporter, April, 1921, page 45; 1920 crop, Monthly Crop Reporter,
April, 1921, page 45. Published records, if any, of estimates
pertaining to the crops of 1921 to 1925, inclusive, are not available.

10/ Reports were prepared by L. L. Janes on the 1926 and 1927 crops
of Louisiana, but they were not published. North Carolina and Arkansas
are two states for which the furnishing of information on the quality
of the cotton crop was attempted prior to the inauguration of reports
by the Division of Cotton Marketing. (See United States Department of
Agriculture Bulletin 476 for information relative to classification of
samples from the North Carolina crops of 1914 and 1915, and Arkansas
Extension Circular 92 for results of classification of samples from
the Arkansas crops of 1915, 1916, 1917, and 1918.)




In order for the Division of Cotton Marketing to carry out the
provisions of the Act of March 3, 1927, it seemed appropriate to
procure samples of cotton from different localities in each of the
cotton-producing states. Prior to the inauguration of this work there
was no comprehensive information indicating differences in grade and
staple length of cotton produced and ginned in these states, and for
this reason the procuring of samples could not at first be planned on
the basis of stratification according to known differences in the
cotton. The importance of proper stratification soon became apparent
to the field representatives, however, when it was realized that the
number of samples on which to base the grade, staple-length, and
tenderability reports would necessarily be limited by funds and
facilities available for complying with the statute. 11/

In order to procure samples on which to base the grade, staple-length,
and tenderability reports, the Division of Cotton Marketing makes
arrangements each year with certain ginners in different parts of the
cotton-producing states whereby they agree to furnish a sample of about
four ounces from each bale ginned at their gins, the sample to be drawn
from the gin press box or from the bale after it has been pressed and

11/ The total number of samples of upland cotton classed and used in
preparing reports on specified crops is as follows: 1928, 1,359,254;
1929, 1,007,952; 1930, 1,139,763; 1931, 1,032,401; 1932, 641,579;
1933, 739,544.

The total number of samples of American-Egyptian cotton classed and
used in preparing reports is as follows: 1928, 3,980; 1929, 4,334;
1930, 4,638; 1931, 2,459; 1932, 1,894; 1933, 3,773.




tied. To date, practically all of the samples classed for purposes of
quality statistics have been press-box samples, but a few ginners have
cut them from the bales.

When arrangements were made with ginners for the procurement of
samples during the 1928-29 season, no comprehensive information was
available that would reliably indicate variations in the grade and
staple length of cotton produced from one state to another or in
different localities within the individual states. In view of this
fact, gins were selected that season to represent states without
reference to these variations and without any particular reference to
geographic heterogeneity. An attempt was made, however, to insure the
procurement of an aggregate of samples in each state sufficiently large
to provide a safe margin of adequacy and reliability with respect to
representative variability, it being realized at the time that the
aggregate of samples likely to be procured might be greater in some
instances than absolutely necessary to insure reliable reports on the
grade and staple length of total ginnings.

After the data pertaining to the classification of samples had become
available for the entire ginning season of 1928-29, as a preliminary
step to assembling them in final form, the producing states were
stratified into districts, based in a general way on prevailing soil
types. This division into districts (See U. S. Dept. Agri. Statistical
Bulletin 40), boundaries of which were adjusted to conform to county
lines, not only permitted the assembling of the grade and staple data
on the basis of convenient subdivisions and the comparison of
variability among them, but it obviated the possible necessity of
determining a series of weights in order to properly emphasize the
relative importance of each grade and staple length ginned in
different parts of producing states, especially in those instances in
which the aggregates of samples procured in different parts of
producing states may not have been uniformly proportionate to total
ginnings therein.

The data pertaining to the grade and staple length of ginnings of the
1928-29 season showed considerable variability, as was expected, from
one state to another and even within the individual states. It was
logically inferred, therefore, in view of the size of sample procured,
that there were greater degrees of heterogeneity in some states and
localities than in others. Because of this varying heterogeneity, the
indications (to field representatives) obviously were that the
proportionate distribution of the aggregate of samples in relation to
ginnings should be different from one state to another, from one
division of a state to another, and, in many instances, even from one
part of a division to another. This, of course, is obvious. It is
mentioned, however, in order to place the proper emphasis upon the
fundamental problem involved in the sampling of grade and staple
universes made up of a large number of heterogeneous stratifications.



Distinction has been carefully made in sampling cotton ginnings
between "uniform variability" and "representative variability," and
the analyses upon which this report is based are predicated on the
assumption that, at least so far as the mathematics is concerned, the
statistical heterogeneity is an indicator of needed representation and
sample requirement. It is realized that there is no rigid formula or
equation by which the number of samples or gins required to insure
representativeness in the proportions of the different grades and
staple lengths can be accurately determined. Procedure has been
devised and adapted, however, by which a logical allocation of an
aggregate of samples is made in accordance with the extent to which
stratifications and larger universes are homogeneous. Statistical
technique, involving the analysis of variability in the sample data
from one season to another, has been found useful in appraising the
adequacy and reliability of sample data procured as representations of
stratified grade and staple-length universes.

With final assembling of the data pertaining to the 1929 crop at the
end of the 1929-30 ginning season, there were available the figures
representing number of samples of each grade and staple length that
had been received from cooperating ginners during two consecutive
seasons. It became possible, therefore, to arrange paired series of
observations for those gins from which samples had been received both
seasons and to analyze the variability both between and within these
series.

For purposes of obtaining numerical measurements of variability in the
sample data, and in order to provide for an evaluation of the relative
importance of variability in different producing areas, data
pertaining to the staple length of cotton ginned at cooperating gins
have been analyzed in their entirety and by arranging them in paired
groups. That part of the procedure pertaining to the derivation of
squared variability (increments of the standard deviation squared) is
so flexible that it permits the separation of variability estimated as
having been contributed from different detected sources. 12/ The value
of statistical method in studying variability in staple-length
distribution of cotton samples received from the cooperating gins is
occasioned by the existence and effects of numerous influencing

12/ See "The Analysis of Variance Method of Measuring Differences
between Staple-Length Designations of Press-Box and Cut Samples of
Cotton," October, 1933, by F. H. Harper and W. B. Lanham.
(Mimeographed.) See also other reports mentioned in footnote 25.




heterogeneous factors. 13/ Among these is geographic location of
cotton-producing areas, some of which are characterized by a wide
diversity of soil, cultural practices, and weather conditions.

The adoption of a stratified sample, beginning with the 1929-30
season, by dividing the states into districts was intended to increase
the homogeneity of individual universes and thus facilitate the proper
proportionate sampling thereof. 14/ This constituted the first attempt
made since the inauguration of the project to gain mastery in the
sampling procedure over the relation between geographic location and
grade and staple length of cotton produced and ginned.

The need for stratification within districts of producing states has
also been recognized, and it has led to an increasingly careful study
each year of grade and staple-length variability and of geographic
heterogeneity. This study has appropriately been divided into two
phases. One of these consists of analyses of differences in cotton
varieties


13/ There is analytical technique admirably adaptable also to the
measurement of differences in paired and replicate classifications of
identical samples of cotton and of different samples from the same
bale which makes possible an evaluation of the relative importance of
variability contributed from different detected sources. It is
adaptable also in many instances to the analysis and interpretation of
price differences.

14/ Lanham, W. B., and Harper, F. H. Jour. of Farm Economics,
Volume XVI, Number 2, April, 1934, pages 329-333. For the 1928-29
season, the Cotton Belt was stratified according to state boundaries.
For the 1929-30 season, however, the individual states were stratified
into districts, the boundaries of which were adjusted to county lines,
the primary purpose being to provide universes that would possibly be
less heterogeneous than entire states. (See U. S. Dept. Agri.
Statistical Bulletin 40.)




planted, soil heterogeneity, ginning machinery, and other factors, the
possible effects of which on grade and staple-length variability do
not readily lend themselves to precise mathematical measurement.
Consideration of these factors and their possible effects has been
found useful in allocating the aggregate of samples which it has been
possible to procure and class with the funds and facilities available
for these purposes.

The other phase of the study consists of analyzing the statistical
variability in proportionate staple-length distribution of ginnings
from year to year within individual states and districts. 15/ The
results of these analyses furnish indications of the difficulty that
might be expected in the sampling of universes by providing for the
appraisal of the degree of statistical heterogeneity and homogeneity
of staple-length distributions. They serve also another useful purpose
in the sampling procedure by providing for an estimate of the number
of homogeneous, or approximately homogeneous, statistical
stratifications.

In making the final analyses of variability in staple-length
distribution and the determination of homogeneous, or approximately
homogeneous, statistical stratifications, paired data have been used
throughout, but data pertaining to all cooperating ginning
establishments have also been analyzed in appraising the importance of
variability in relation to the sampling procedure. To avoid the use of
large numbers


15/ No similar analyses have been made of grade variability.



and, which is more important, to correct for differences in volume of
ginnings at identical gins during successive seasons, most of the
analyses have been based on the percentages representing the
proportions of each staple length ginned rather than on the actual
number of bales. It then becomes possible to make a logical comparison
and analysis of differences in distribution of ginnings at the same
gins.
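The conversion from bale counts to percentages may be sketched as
follows. The staple lengths and bale counts are hypothetical figures
chosen only for illustration, and the function name `to_percentages`
is an assumption rather than part of the procedure used in preparing
the reports.

```python
# Hypothetical bale counts by staple length at one gin in two seasons.
# These figures are illustrative only; they are not taken from the reports.
season_1 = {"7/8": 120, "15/16": 300, "1": 180}
season_2 = {"7/8": 60, "15/16": 210, "1": 30}


def to_percentages(bales_by_length):
    """Convert bale counts to percentages of the gin's total ginnings."""
    total = sum(bales_by_length.values())
    return {length: 100.0 * count / total
            for length, count in bales_by_length.items()}


pct_1 = to_percentages(season_1)
pct_2 = to_percentages(season_2)

# The second season has half the volume of the first, yet the two
# percentage distributions can be compared length by length.
for length in season_1:
    print(f"{length:>6}: {pct_1[length]:5.1f}%  {pct_2[length]:5.1f}%")
```

Working in percentages removes the effect of the change in volume, so
that only the change in distribution remains to be analyzed.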

The following equations indicate some of the basic considerations
underlying the analyses of sample data used in preparing the grade and
staple-length reports.

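1. $\sum[(d_x) + (d_y)]^2 = \sum(d_x)^2 + \sum(d_y)^2 + 2\sum(d_x)(d_y)$

2. $\frac{\sum[(d_x) + (d_y)]^2}{n} = \frac{\sum(d_x)^2}{n} + \frac{\sum(d_y)^2}{n} + \frac{2\sum(d_x)(d_y)}{n}$

3. $\sum[(d_x) + (d_y) + (d_u)]^2 = \sum(d_x)^2 + \sum(d_y)^2 + \sum(d_u)^2 + 2\sum(d_x)(d_y) + 2\sum(d_x)(d_u) + 2\sum(d_y)(d_u)$

4. $\frac{\sum[(d_x) + (d_y) + (d_u)]^2}{n} = \frac{\sum(d_x)^2}{n} + \frac{\sum(d_y)^2}{n} + \frac{\sum(d_u)^2}{n} + \frac{2\sum(d_x)(d_y)}{n} + \frac{2\sum(d_x)(d_u)}{n} + \frac{2\sum(d_y)(d_u)}{n}$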
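The three-series relation may be checked numerically with a short
sketch such as the one below. It assumes that each d denotes a
deviation from the mean of its own series; the x and y values are
those of columns 2 and 3 of table 1, the u series is hypothetical, and
the helper names are illustrative only.

```python
def deviations(values):
    """Deviations of each observation from the mean of its series."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def mean_product(a, b):
    """Mean of the products of paired deviations (divisor n)."""
    return sum(p * q for p, q in zip(a, b)) / len(a)


x = [40, 40, 40, 50, 50, 50]   # column 2 of table 1
y = [10, 20, 30, 10, 20, 30]   # column 3 of table 1
u = [5, 9, 4, 8, 6, 10]        # hypothetical third series

dx, dy, du = deviations(x), deviations(y), deviations(u)

# Left side: mean square of the summed deviations.
lhs = sum((a + b + c) ** 2 for a, b, c in zip(dx, dy, du)) / len(x)

# Right side: three mean squares plus twice each mean cross-product.
rhs = (mean_product(dx, dx) + mean_product(dy, dy) + mean_product(du, du)
       + 2 * mean_product(dx, dy)
       + 2 * mean_product(dx, du)
       + 2 * mean_product(dy, du))

print(lhs, rhs)  # the two sides agree apart from rounding
```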

If observations in two series are from the same universe, then, on the
average, the standard deviation squared of one is expected to equal
the standard deviation squared of the other, and the product of the
two standard deviations will be equivalent, on the average, to the
square of the individual standard deviations. It is apparent, of
course, that the squares of standard deviations calculated for series
representing different periods of time need not necessarily be
identical in magnitude in order for the indications to be that the
different gins or series of gins are sampling the same, or very
similar, stratifications, especially if seasonal changes affect the
variability.

Whenever r = 0, $\sigma^2_{x+y} = 2\sigma^2$, as the following
illustration will indicate. The data in columns 2 and 3 of table 1 are
used in making this illustrative presentation.


Table 1.- Illustrative data in which the variance of the column
          of summations equals two standard deviations squared

          1            :     2     :     3     :     4     :    5
Duplicate observation  :     x     :     y     :   x + y   :    xy
        number         :           :           :           :

          1            :    40     :    10     :    50     :   400
          2            :    40     :    20     :    60     :   800
          3            :    40     :    30     :    70     :  1200
          4            :    50     :    10     :    60     :   500
          5            :    50     :    20     :    70     :  1000
          6            :    50     :    30     :    80     :  1500

Total                  :   270     :   120     :   390     :  5400
Mean                   :    45.0   :    20.0   :    65.0   :   900
Mean of squares        :  2050.00  :   466.67  :  4316.67  :
Square of mean         :  2025.00  :   400.00  :  4225.00  :
Standard deviation     :           :           :           :
  squared              :    25.00  :    66.67  :    91.67  :

$\sigma_x = 5.00$
$\sigma_y = 8.1651$

Product-moment $= \frac{\sum xy}{n} - M_x M_y = 0$

r = 0, and $\sigma^2_{x+y} = (\sigma^2_x + \sigma^2_y)$
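The summary rows of table 1 may be reproduced with the short sketch
below. It assumes, as the table's own rows indicate, that the standard
deviation squared is the mean of squares less the square of the mean
(divisor n); the function names are illustrative only.

```python
x = [40, 40, 40, 50, 50, 50]   # column 2 of table 1
y = [10, 20, 30, 10, 20, 30]   # column 3 of table 1


def mean(values):
    return sum(values) / len(values)


def sd_squared(values):
    """Mean of squares minus square of mean, as in the table."""
    return mean([v * v for v in values]) - mean(values) ** 2


# Column 4 (x + y) and the product-moment derived from column 5 (xy).
summed = [a + b for a, b in zip(x, y)]
product_moment = mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

print(round(sd_squared(x), 2))       # standard deviation squared of x
print(round(sd_squared(y), 2))       # standard deviation squared of y
print(round(sd_squared(summed), 2))  # standard deviation squared of x + y
print(product_moment)                # zero, hence r = 0
```

With a product-moment of zero, the standard deviation squared of the
x + y column is simply the sum of the two individual values.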

The coefficient of correlation is the quotient obtained by dividing
the correlated variability 16/ by the geometric mean 17/ of the

16/ Correlated variability may be conveniently referred to as
"covariance."

17/ The geometric mean is the nth root of a product. When there are
two numbers, the geometric mean is the square root of their product;
when there are three numbers, the geometric mean is the cube root of
the product; etc.




variances. In actual calculation, the product-moment becomes the
correlated item, and the product of the two standard deviations
constitutes the equivalent of the geometric mean of the variances. An
important consideration in the derivation and interpretation of a
coefficient of correlation is that there should be a sufficiently
large number of observations in the series to overcome the tendency
toward unity. It is to be remembered also that errors of observation
do not cancel out in obtaining "r", so that the calculated coefficient
will not be the same as it would be if these errors were not present.
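The calculation just described, covariance divided by the geometric
mean of the variances, may be sketched as follows. The function name
`correlation` is illustrative, and the second pair of series is
hypothetical, chosen so that one series is an exact linear function of
the other.

```python
import math


def correlation(x, y):
    """Product-moment divided by the geometric mean of the variances."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Product-moment (covariance): mean of xy minus product of the means.
    product_moment = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    # The geometric mean of the two variances equals the product of the
    # two standard deviations.
    return product_moment / math.sqrt(var_x * var_y)


# Columns 2 and 3 of table 1: no correlated variability.
r_table_1 = correlation([40, 40, 40, 50, 50, 50], [10, 20, 30, 10, 20, 30])

# Hypothetical series in which y = 2x + 3 exactly.
r_linear = correlation([1, 2, 3, 4], [5, 7, 9, 11])

print(r_table_1, r_linear)
```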

Separation of the correlated and uncorrelated parts of variability is
readily accomplished after the squares of the standard deviations have
been obtained. The following equations, in which the symbols "c" and
"u" are adopted to designate the correlated and uncorrelated parts of
variability, respectively, will illustrate the procedure as applied to
table 1. In presenting the results, fractional parts of variability
are referred to as standard deviation squared instead of increments of
standard deviation squared. Analysts may use the latter term if it is
felt that the results referred to as standard deviation squared do not
convey sufficient implication in this respect.

$4\sigma^2_c + 2\sigma^2_u = 91.67$ (or $\sigma^2_{x+y}$)

$2\sigma^2_c + \sigma^2_u = 45.835$ (or $\sigma^2_{x+y}$ divided by 2)

$\sigma^2_c + \sigma^2_u = 45.835$ (or one-half of $\sigma^2_x + \sigma^2_y$)

$\sigma^2_c = 0$ (or 45.835 - 45.835)

$\sigma^2_u = 45.835$ (or 45.835 - 0)

These calculations are the equivalent of the following:

(1) $\sigma^2_c = \frac{\sigma^2_{x+y}}{2} - \frac{\sigma^2_x + \sigma^2_y}{2} = 45.835 - 45.835 = 0$

(2) $\sigma^2_u = \frac{\sigma^2_x + \sigma^2_y}{2} - \sigma^2_c = 45.835 - 0 = 45.835.$

The value 91.67 includes, as the equations indicate and as the
calculations have illustrated, four parts of $\sigma^2_c$, which is a
zero quantity in this instance, and two parts of $\sigma^2_u$.
One-half of this value, therefore, or 45.835, includes two parts of
$\sigma^2_c$ and one of $\sigma^2_u$. The average of the sum of 25.00
and 66.67, the squares of the standard deviations of the two
individual series, includes one $\sigma^2_c$ and one $\sigma^2_u$, and
$\sigma^2_{x+y}$ is equal to $\sigma^2_x$ plus $\sigma^2_y$ plus
$2\sigma_{xy}$.

Any difference remaining after one-half the summation of the squares
of standard deviations of the two individual series (columns 2 and 3)
is subtracted from one-half the standard deviation squared of the
x + y series (column 4) is attributable to the fact that the latter
value contains one $\sigma^2_c$ more than does the former value.

The difference between these two measures represents, therefore, the
magnitude of that extra $\sigma^2_c$. Its derivation is one of the
calculations incident to the solving of two normal equations. In this
particular instance the two values are identical, since there is no
correlated item of variability, and the difference between them must
necessarily be zero. It is apparent, then, that the total variability
is accounted for by the uncorrelated item.
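The separation described above may be sketched as a short routine
applied to columns 2 and 3 of table 1. The function name
`split_variability` is illustrative only. Unrounded arithmetic gives
45.833 for the uncorrelated part; the figure 45.835 in the text
results from halving the rounded value 91.67.

```python
def sd_squared(values):
    """Mean of squares minus square of mean (divisor n)."""
    n = len(values)
    return sum(v * v for v in values) / n - (sum(values) / n) ** 2


def split_variability(x, y):
    """Return the (correlated, uncorrelated) parts of variability."""
    summed = [a + b for a, b in zip(x, y)]
    # One-half the standard deviation squared of x + y: two parts
    # correlated plus one part uncorrelated.
    half_of_sum = sd_squared(summed) / 2
    # One-half the summation of the individual squares: one part
    # correlated plus one part uncorrelated.
    half_of_parts = (sd_squared(x) + sd_squared(y)) / 2
    correlated = half_of_sum - half_of_parts
    uncorrelated = half_of_parts - correlated
    return correlated, uncorrelated


x = [40, 40, 40, 50, 50, 50]
y = [10, 20, 30, 10, 20, 30]
correlated, uncorrelated = split_variability(x, y)
print(round(correlated, 3), round(uncorrelated, 3))
```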




In order to evaluate the effect of season, or the relation that season
bears to the observed variability, it is desirable to break down the
uncorrelated item into its component parts. Seasonal changes and their
effects on staple-length variability are not controlled, of course,
under field conditions of cotton production, and it is for this reason
that there is need of a measure for that part of uncorrelated
variability that is in addition to the contribution attributable to
season. There is then available for use in further interpreting the
extent of homogeneity, as well as heterogeneity and resulting
stratification, some indication of the relation that the uncorrelated
variability contributed by differences "between the series" bears to
the uncorrelated variability contributed by differences "within the
series."

There might be instances in the analysis of certain types of data in
which the correlated item will account for the total calculated
variability, since if r = 1, then $\sigma^2_{x+y} = 4\sigma^2$. The
relation that this, together with the fact that if r = 0,
$\sigma^2_{x+y} = 2\sigma^2$, bears to analyses by which the
correlated and uncorrelated parts of variability are separated is
appreciated when the probability is realized of both parts of
variability frequently, if not generally, occurring in the same total.
It would likely be a very rare exception under actual conditions of
sampling, especially in the case of biological populations, if an
instance were found in which either the correlated or uncorrelated
part of variability accounts for the total variability in observations
pertaining to successive seasons. The following example will
illustrate perfect correlation, thus showing that when r = 1,
$\sigma^2_{x+y} = 4\sigma^2$.




Table 2. - Illustrative data in which the variance of the column
           of summations equals four standard deviations squared

           1                  2          3          4          5
--------------------------------------------------------------------
 Duplicate observation        x          y        x + y       xy
        number
--------------------------------------------------------------------
           1                 12         12         24        144
           2                 18         18         36        324
           3                 25         25         50        625
           4                 30         30         60        900
           5                 40         40         80       1600
--------------------------------------------------------------------
 Total                      125        125        250       3593
 Mean                        25         25         50        718.6
 Mean of squares            718.6      718.6     2874.4
 Square of mean             625.0      625.0     2500.0
--------------------------------------------------------------------
 Standard deviation
   squared                   93.6       93.6      374.4

$\sigma_x = 9.6747$

$\sigma_y = 9.6747$

Product-moment $= M_{xy} - M_x M_y = 93.6$

$r = \dfrac{\text{product-moment}}{\sigma_x \sigma_y} = 1.0$

and $\sigma_{x+y}^2 = 4\sigma^2$ (i.e., $\sigma_x^2 + \sigma_y^2$ times 2)
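The arithmetic of table 2 can be checked mechanically. The following is
a minimal sketch in Python, assuming population (divide-by-n) variances
as used in the table; the function name `product_moment` is illustrative
only and does not appear in the original.

```python
from statistics import fmean, pvariance

# Paired observations from table 2 (y duplicates x, so r should be 1).
x = [12, 18, 25, 30, 40]
y = [12, 18, 25, 30, 40]

def product_moment(a, b):
    """Mean of the paired products minus the product of the two means."""
    return fmean(p * q for p, q in zip(a, b)) - fmean(a) * fmean(b)

var_x = pvariance(x)                                # column 2
var_y = pvariance(y)                                # column 3
var_sum = pvariance([p + q for p, q in zip(x, y)])  # column 4
cov = product_moment(x, y)
r = cov / (var_x * var_y) ** 0.5                    # covariance / geometric mean

print(round(var_x, 4), round(var_sum, 4), round(cov, 4), round(r, 4))
```

With identical series the variance of the summations comes out at four
times the variance of either series, which is the relation the table is
meant to show.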




As in the first illustration, in which the coefficient of
correlation is equal to 0, r is calculated by dividing the covariance
by the geometric mean of the variances. The covariance, corresponding
to the product-moment, is of the same magnitude in this instance as the
geometric mean of the variances, which is the equivalent of the product
of the two standard deviations. When this agreement in magnitude
occurs, it is obvious that the coefficient of correlation must
necessarily be perfect.

Determination of the correlated and uncorrelated parts of
variability is accomplished by the procedure and equations already
presented in connection with table 1, in which, as before, the symbol
"c" is used to designate the correlated item, and the symbol "u", the
uncorrelated item. We have, therefore, the following:

$4\sigma_c^2 + 2\sigma_u^2 = 374.4$  (or $\sigma_{x+y}^2$)

$2\sigma_c^2 + \sigma_u^2 = 187.2$  (or $\sigma_{x+y}^2$ divided by 2)

$\sigma_c^2 + \sigma_u^2 = 93.6$  (or one-half of $\sigma_x^2 + \sigma_y^2$)

$\sigma_c^2 = 93.6$  (or 187.2 - 93.6)

$\sigma_u^2 = 0$  (or 93.6 - 93.6)

These calculations are the equivalent of the following:

(1) $\sigma_c^2 = \dfrac{\sigma_{x+y}^2}{2} - \dfrac{\sigma_x^2 + \sigma_y^2}{2} = 187.2 - 93.6 = 93.6$

(2) $\sigma_u^2 = \dfrac{\sigma_x^2 + \sigma_y^2}{2} - \sigma_c^2 = 93.6 - 93.6 = 0.$
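The two equations above translate directly into a short routine. This
is a sketch only, assuming population variances; the function name
`split_variability` is hypothetical and is introduced here purely for
illustration.

```python
from statistics import pvariance

def split_variability(x, y):
    """Return (correlated item, uncorrelated item) for paired series x, y."""
    var_sum = pvariance([a + b for a, b in zip(x, y)])
    half_var_sum = var_sum / 2                    # 2*c + u
    mean_var = (pvariance(x) + pvariance(y)) / 2  # c + u
    c = half_var_sum - mean_var                   # equation (1)
    u = mean_var - c                              # equation (2)
    return c, u

# Table 2 data: the two series are identical.
c, u = split_variability([12, 18, 25, 30, 40], [12, 18, 25, 30, 40])
print(round(c, 4), round(u, 4))
```

For the table 2 data the correlated item equals the common variance and
the uncorrelated item vanishes, in agreement with the figures above.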


The covariance will not be expected to be larger than the
geometric mean of the variances, since the coefficient of correlation
calculated by the ordinary methods may range only from 0 to 1.




The correlated part of variability is 93.6, and the uncorrelated
part is 0. Four times the correlated part of variability plus two times
the uncorrelated part equals 374.4, as stated in the first of the
preceding equations, and $\sigma_{x+y}^2$, column 4 of table 2, is equal
to $\sigma_x^2$ plus $\sigma_y^2$ plus $2\sigma_c^2$.

The two illustrations represented by tables 1 and 2 have
indicated the conditions under which total variability would be expected
to be accounted for entirely by either the correlated or the
uncorrelated item. In actual sampling problems, these conditions, as
already indicated, would seldom, if ever, be expected to occur,
especially in biological populations pertaining to successive seasons.
A comprehension of possibilities becomes more essential, therefore, in
order to facilitate the interpretation of the constituent elements of
total variability, to which these correlated and uncorrelated parts
contribute.

It is by such interpretation of variability that the analyst
concerned with the sampling of stratified universes is often able to
formulate logical conclusions relative to the homogeneity of the
population from which the aggregate of samples was drawn. This, together
with some logical appraisal of the extent and importance of
heterogeneity, constitutes one of the principal problems in sampling a
universe.

The Grade and Staple Statistics Section of the Division of
Cotton Marketing has relied completely on sample data in preparing the
grade and staple reports on cotton ginned in the United States during
the six seasons of 1928-29 to 1933-34. 19/ It has been important,
therefore, to make some study of the variability in ginnings within
districts and states from year to year and to interpret as far as
practicable the causes of differences in proportionate distribution
of ginnings at the same gins and in the same localities from one year
to another. Variability in staple length of ginnings has been found to
be so great in many instances that no attempt has yet been made with
the sample now being procured to report on the quality of cotton ginned
in each individual county or on the quality of cotton ginned in
producing areas smaller than the districts into which the states are
subdivided. 20/

19/ See United States Department of Agriculture Bulletin 40, "Grade,
Staple Length, and Tenderability of Cotton in the United States, 1928-29
to 1931-32"; and periodic grade and staple releases, 1928-29 to 1933-34.

In analyzing the variability in distribution among the different
staple lengths of ginnings at cooperating gins during consecutive
seasons, the mathematical technique has been designed to furnish results
that indicate the homogeneity and heterogeneity of the statistical
universes sampled. The following illustration represents one procedure
that may be useful in some instances in the analysis and interpretation
involved in the isolation of correlated variability 21/ without
the determination of the magnitude of individual parts contributing to
the uncorrelated variability. Gins represented in table 3 are a part
of those that cooperated with the Division of Cotton Marketing during
the seasons specified by furnishing samples to be classed and used in
preparing the grade, staple-length, and tenderability reports.

20/ See footnotes 11 and 14. It is apparent, of course, that by
sampling every bale at certain gins, as the cooperating ginners agreed
to do, a larger aggregate of samples was probably obtained in some
instances than was absolutely necessary to represent a community or
locality. So far, however, it has not been considered advisable to
sample only a part of the bales ginned at cooperating gins. In the
first place, sampling only a part of the bales might result in the
samples being drawn at improper intervals, and, in the second place,
this plan would result in information being available on the quality
of the cotton belonging to some patrons of a gin and not on the quality
of the cotton belonging to other patrons. It will be realized, however,
that if such a plan were feasible the number of stratifications could
be increased without any change in the aggregate of samples. Ultimately
it may be considered desirable by the Congress to sample and
class all cotton bales ginned at all gins.

The Association of Southern Agricultural Workers in convention at
Jackson, Mississippi, February 5, 6, and 7, 1930, adopted a resolution
asking for "adequate appropriations to provide a larger
statistical sample in all of the states." (Proceedings, 31st
annual convention, pp. 5 and 6.)

21/ In this paper the expression "standard deviation squared" is
used for convenience throughout instead of "increment of standard
deviation squared" in referring to fractional parts of variability
contributing to the total. Analysts may use the latter term if it is
considered preferable.




Table 3. - Variance analysis of the percentage distribution of cotton
           shorter than 7/8 inch ginned at six Louisiana gins during
           specified successive seasons

                              Percentage distribution of cotton
                                 shorter than 7/8 inch 2/
 Gin designation 1/          ------------------------------------
                              1928-29      1929-30
                                (x)          (y)         x + y
-----------------------------------------------------------------
 A                              46.1         33.3         79.4
 B                              34.9         43.9         78.8
 C                              12.6         15.2         27.8
 D                               2.4          2.9          5.3
 E                              37.4         71.8        109.2
 F                              16.4         20.6         37.0
-----------------------------------------------------------------
 Total                         149.8        187.7        337.5
 Mean                           24.97        31.28        56.25
 Mean of squares               862.58      1475.86      4434.73
 Square of mean                623.50       978.44      3164.06
-----------------------------------------------------------------
 Standard deviation squared    239.08       497.42      1270.67

1/ Representing gins in Lincoln, Bienville, Sabine, DeSoto, Union
and Claiborne parishes.

2/ The percentages for the individual seasons represent the
proportions that cotton shorter than 7/8 inch was of the total ginned at
specified gins.

$4\sigma_c^2 + 2\sigma_u^2 = 1270.67$  (or $\sigma_{x+y}^2$)

$2\sigma_c^2 + \sigma_u^2 = 635.34$  (or $\sigma_{x+y}^2$ divided by 2)

$\sigma_c^2 + \sigma_u^2 = 368.25$  (or one-half of $\sigma_x^2 + \sigma_y^2$)

$\sigma_c^2 = 267.09$  (or 635.34 - 368.25)

$\sigma_u^2$ for 6 $= 101.16$  (or 368.25 - 267.09)

$\sigma_u^2$ for 12 $= 101.16 + 9.92$  (or 101.16 + $\sigma^2$ of means) $= 111.08$




In this problem the calculations have been facilitated by
carrying decimals to only two places. If decimals were carried one
additional place in divisions made subsequent to the calculation of the
squares of standard deviations, then one-half of 1270.67, the square of
the standard deviation of x + y, would be expressed as 635.335 instead
of as 635.34, in accordance with the engineer's rule, which operates to
prevent all the inaccuracies being in one direction. The calculated
correlated item would be 267.085 if decimals were carried this
additional place, and the uncorrelated item would be 101.165 instead of
101.16. In this instance, four times the correlated item plus two times
the uncorrelated item would be equal to 1270.67, and $\sigma_{x+y}^2$
would be equal to $\sigma_x^2$ plus $\sigma_y^2$ plus $2\sigma_c^2$. The
carrying of decimals in such instances to more than two places solely
for the purpose of obtaining a greater degree of precision in the
arithmetic calculations may not be warranted because of certain
characteristic variability in the basic data.

It is obvious in this instance, because of the variability, that
there is stratification within the series. The indications of
stratification are readily observed, of course, independent of the
calculations, because of the wide range in magnitude of observations.
It is because of this wide range, together with the consequent
indicated certainty of stratification, that the x and y distributions in
table 3 are analyzed for purposes of illustrating the technique that has
been applied in determining the extent of statistical stratification
represented by the sample data used in preparing the grade and
staple-length reports. The circumstances which permit the detection of
significant stratification, such as is apparent in table 3, without the
necessity of measuring the variability arithmetically should help to
clarify the analytical concepts of the method used in studying
differences in staple-length distribution of cotton ginned in the
different states and districts.

The correlated item of 267.09 is the equivalent of the
product-moment, which can be readily proved by subtracting the product
of the two means, 24.97 and 31.28, from the mean product of paired x and
y observations. The product of the two means is 781.06, and the mean of
the products of the paired observations is 1048.15. Between these two
values there is a difference of 267.09, corresponding to the correlated
item. The uncorrelated item for six observations is accounted for by
the difference between 267.09 and 368.25. This difference is 101.16.
For twelve observations the uncorrelated item is 111.08, the sum of
101.16 and the standard deviation squared of the means of x and y.
The magnitude of this squared standard deviation is influenced by any
seasonal changes which cause a difference between the two means.
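The product-moment check described above can be sketched in Python as
follows. The two-place rounding is an assumption about how the working
in table 3 was carried; the variable names are illustrative only.

```python
from statistics import fmean

# Percentages shorter than 7/8 inch at gins A-F (table 3).
x = [46.1, 34.9, 12.6, 2.4, 37.4, 16.4]   # 1928-29
y = [33.3, 43.9, 15.2, 2.9, 71.8, 20.6]   # 1929-30

mean_of_products = fmean(a * b for a, b in zip(x, y))

# Two-place working: round the means first, then their product.
mean_x_2 = round(fmean(x), 2)
mean_y_2 = round(fmean(y), 2)
product_of_means_2 = round(mean_x_2 * mean_y_2, 2)
pm_two_place = round(mean_of_products, 2) - product_of_means_2

# Unrounded working, for comparison.
pm_full = mean_of_products - fmean(x) * fmean(y)

print(round(pm_two_place, 2), round(pm_full, 4))
```

The two-place result agrees with the correlated item of 267.09, and the
unrounded result differs from it only in the second decimal place.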

The relative magnitudes of correlated and uncorrelated parts of
variability are affected by differences in paired observations and by
differences within the individual series. In order to further illustrate
this relationship and to show in greater detail the statistical
constituency of variability in ginnings at the same gins during
consecutive seasons, the values derived from the data in table 3 have
been recalculated and the decimals carried a greater number of places.
This illustration of differences between the magnitudes of numerical
values will further indicate the application of this method of analysis
in studying the relationship between fractional parts of total
variability inherent in paired series of percentages representing the
proportions that cotton of any specified staple length was of total
ginnings at gins represented.




Table 4. - Variance analysis of the percentage distribution of cotton
           shorter than 7/8 inch ginned at six Louisiana gins during
           specified successive seasons 1/

                              Percentage distribution of cotton
                                 shorter than 7/8 inch
 Gin designation             ------------------------------------
                              1928-29      1929-30
                                (x)          (y)         x + y
-----------------------------------------------------------------
 A                              46.1         33.3         79.4
 B                              34.9         43.9         78.8
 C                              12.6         15.2         27.8
 D                               2.4          2.9          5.3
 E                              37.4         71.8        109.2
 F                              16.4         20.6         37.0
-----------------------------------------------------------------
 Total                       149.8        187.7         337.5
 Mean                         24.96667     31.28333      56.25
 Mean of squares             862.57667   1475.85833    4434.72833
 Square of mean              623.33461    978.64674    3164.06250
 Standard deviation
   squared                   239.24206    497.21159    1270.66583

1/ Basic data taken from table 3. Values are recalculated by
carrying the decimals a greater number of places than they were
carried in table 3.

$4\sigma_c^2 + 2\sigma_u^2 = 1270.6658$  (or $\sigma_{x+y}^2$)

$2\sigma_c^2 + \sigma_u^2 = 635.3329$  (or $\sigma_{x+y}^2$ divided by 2)

$\sigma_c^2 + \sigma_u^2 = 368.2268$  (or one-half of $\sigma_x^2 + \sigma_y^2$)

$\sigma_c^2 = 267.1061$  (or 635.3329 - 368.2268)

$\sigma_u^2$ for 6 $= 101.1207$  (or 368.2268 - 267.1061)

$\sigma_u^2$ for 12 $= 101.1207 + 9.9750$  (or 101.1207 + $\sigma^2$
of means) $= 111.0957$
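The whole sequence of calculations under table 4 can be reproduced from
the raw percentages. This is a sketch assuming population variances
throughout; because it works without intermediate rounding, the last
decimal place may differ slightly from the printed figures.

```python
from statistics import fmean, pvariance

x = [46.1, 34.9, 12.6, 2.4, 37.4, 16.4]   # 1928-29
y = [33.3, 43.9, 15.2, 2.9, 71.8, 20.6]   # 1929-30

var_sum = pvariance([a + b for a, b in zip(x, y)])  # 4c + 2u
mean_var = (pvariance(x) + pvariance(y)) / 2        # c + u

c = var_sum / 2 - mean_var        # correlated item
u6 = mean_var - c                 # uncorrelated item, 6 paired observations

# Variance of the two seasonal means about their common mean.
var_of_means = pvariance([fmean(x), fmean(y)])
u12 = u6 + var_of_means           # uncorrelated item, 12 observations

for name, value in [("c", c), ("u6", u6),
                    ("var_of_means", var_of_means), ("u12", u12)]:
    print(f"{name:>13} = {value:.4f}")
```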




The common mean of the x and y observations is 28.125, the
quotient obtained by dividing 12 into 337.5, the total summation. The
mean of the squares of deviations of individual x and y observations
from the common mean is 378.2018. This mean of squared deviations
exceeds 368.2268, which includes one correlated part and one
uncorrelated part of variability, by 9.9750, the square of the standard
deviation of the two means, 24.96667 and 31.28333.
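The relation just stated, that the mean squared deviation about the
common mean equals the average of the two within-season variances plus
the variance of the two means, can be verified with a short sketch. The
names `within` and `between` are illustrative labels, not terms used in
the original.

```python
from statistics import fmean, pvariance

x = [46.1, 34.9, 12.6, 2.4, 37.4, 16.4]   # 1928-29
y = [33.3, 43.9, 15.2, 2.9, 71.8, 20.6]   # 1929-30

pooled = x + y
common_mean = fmean(pooled)                     # 337.5 / 12
msd_common = fmean((v - common_mean) ** 2 for v in pooled)

within = (pvariance(x) + pvariance(y)) / 2      # c + u for 6 pairs
between = pvariance([fmean(x), fmean(y)])       # variance of the two means

print(round(common_mean, 3), round(msd_common, 4),
      round(within, 4), round(between, 4))
```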

It may be interesting to observe also that the difference
between the average of the squares of all the x and y observations and
the product of the means of x and y is greater than 368.2268, or the
product-moment plus the uncorrelated part of variability, by an amount
equal to twice the magnitude of 9.9750, the square of the standard
deviation of the means. The difference between the average of the
squares of the two means and their product is also equal to twice the
square of this standard deviation. This latter difference, that is,
the difference between the average of the squares of the two means and
their product, exceeds the square of the standard deviation of the x
and y means by an amount equal to the product of the deviations of the
two individual means from the common mean. 22/ These deviations of
individual means from the common mean are necessarily of the same
magnitude.
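These identities follow from the fact that, for two means, one-half the
squared difference between them is twice the variance of the means. A
minimal numerical check, with illustrative variable names, might look
like this:

```python
from statistics import fmean, pvariance

x = [46.1, 34.9, 12.6, 2.4, 37.4, 16.4]   # 1928-29
y = [33.3, 43.9, 15.2, 2.9, 71.8, 20.6]   # 1929-30

mx, my = fmean(x), fmean(y)
var_of_means = pvariance([mx, my])
mean_var = (pvariance(x) + pvariance(y)) / 2      # c + u

# Average of squares of all observations, less the product of the means.
avg_sq_obs = fmean(v * v for v in x + y)
excess_obs = (avg_sq_obs - mx * my) - mean_var    # twice var_of_means

# Average of squares of the two means, less their product.
excess_means = (mx * mx + my * my) / 2 - mx * my  # twice var_of_means

# Deviations of each mean from the common mean are equal and opposite,
# so their product is the negative of var_of_means.
common = (mx + my) / 2
dev_product = (mx - common) * (my - common)

print(round(excess_obs, 4), round(excess_means, 4), round(dev_product, 4))
```

Note that the product of the two deviations is negative in sign; it is
its magnitude that equals the square of the standard deviation of the
means.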

If observations in the y series were of the same magnitude as the
observations in the x series with which they are paired, there would be
no uncorrelated variability, as was the case with the distributions of
observations presented in table 2, in which the correlated item
accounts for total variability. In such an instance as that represented
by paired observations in table 2 the square of the standard deviation
of the series of summations of paired x and y observations is
equal to four times the square of the standard deviation of either of
the individual series, or exactly twice as great as the summation of
the two squared standard deviations calculated for the x and y series.

22/ These relationships, well known to mathematicians, are herein
mentioned because a complete comprehension of them might assist in
interpreting the results of the analytical technique applied to the
study of variability in ginnings at identical gins during consecutive
seasons.

With the correlated and uncorrelated parts of variability
isolated, it is then possible to proceed with the separation of
uncorrelated variability into its component parts. The isolation of the
different parts of variability helps to make it possible to formulate
logical conclusions as to the amount of stratification, which is
directly related to the sampling procedure. A part of the stratification
may be attributable to changes in varieties planted from year to year.
The effect of changes in varieties on variability in staple length of
ginnings is reflected in the results of the analyses.

A further analysis of this general type is sometimes desirable
in determining the relative importance of the different parts of
variability contributed from the different detected sources. This
analysis is easily accomplished by arranging the data to facilitate a
two-way stratification. It is then possible to calculate readily the
measures representing each part of variability without any
rearrangement of the observations. The following table and calculations
are presented to illustrate the technique and to indicate the
interpretation to be placed upon the results.

It will be observed that the fractional parts of variability
contributing to the total are referred to as standard deviation
squared, just as they have been in the analysis of data in tables 1,
2, 3, and 4. If considered preferable, which it may well be, these
fractional parts of variability can be referred to in each analysis
as increments of the standard deviation squared. (See footnote 21.)


Table 5. - [Table illegible in the source copy; its title and entries
are not recoverable.]


$4\sigma_c^2 + 2\sigma_u^2$   =   67.9660
$2\sigma_c^2 + \sigma_u^2$    =   33.9830
$\sigma_c^2 + \sigma_u^2$     =   38.3705
$\sigma_c^2$                  =   -4.3875  (gin)
$\sigma_u^2$                  =   42.7580  (season + error for 5 observations)
$\sigma^2$ of means           =   35.4144
Total                         =   78.1724  (season + error for 10 observations)

$9\sigma_c^2 + 3\sigma_u^2$   =  885.3600
$3\sigma_c^2 + \sigma_u^2$    =  295.1200
$\sigma_c^2 + \sigma_u^2$     =   56.7932
$2\sigma_c^2$                 =  238.3268
$\sigma_c^2$                  =  119.1634  (season)
$\sigma_u^2$                  =  -62.3702  (gin + error for 2 seasons)
$\sigma^2$ of means           =   16.9914
Total                         =  -45.3788  (gin + error for 10 observations)

 -45.3788  (gin + error for 10 observations)
  -4.3875  (gin)
 -40.9913  (error for 10 observations) 23/

  78.1724  (season + error for 10 observations)
 119.1634  (season)
 -40.9910  (error for 10 observations) 24/

23/ If decimals are carried a sufficient number of places, identical
values for error can be obtained by this calculation and by the
determination of the difference between the measures of variability for
season and for season and error combined. A check on each of these
series of calculations is thus provided.




By multiplying each derived part of variability by the number
of observations and then dividing by corresponding degrees of freedom,
the following is obtained:

Gin     =   -4.3875 x 10, or   -43.8750, ÷ 4 =  -10.9688
Season  =  119.1634 x 10, or  1191.6340, ÷ 1 = 1191.6340
Error   =  -40.9913 x 10, or  -409.9130, ÷ 9 =  -45.5459

Total   =   73.7846 x 10, or   737.8460
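The conversion above is simple enough to express as a short sketch. The
parts of variability and the divisors are copied from the figures as
printed; the dictionary names are illustrative only.

```python
# Parts of variability and divisors exactly as printed in the text above.
N_OBSERVATIONS = 10
parts = {"gin": -4.3875, "season": 119.1634, "error": -40.9913}
divisors = {"gin": 4, "season": 1, "error": 9}

# Multiply each part by the number of observations ...
sums = {name: value * N_OBSERVATIONS for name, value in parts.items()}
# ... then divide by the corresponding divisor.
mean_squares = {name: sums[name] / divisors[name] for name in parts}

total = sum(sums.values())

for name in parts:
    print(f"{name:>6}: {sums[name]:10.4f} -> {mean_squares[name]:10.4f}")
print(f" total: {total:10.4f}")
```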

The value 737.8460 is the equivalent of the measure of total
squared variability that may be obtained by squaring the individual
x and y observations, summating, and then subtracting the product of
the sum of the two series and their common mean. In studies involving
the analysis of differences in paired and replicate classifications
of cotton samples, variability has been analyzed by the Analysis
of Variance method and separated into component parts preparatory
to evaluating the degree of significance of differences between
variances and preparatory to evaluating the relative importance
of "bias" and of that part of "spread" which is in addition to "bias."


With component parts of variability attributable to differences
in classing thus obtained, the error is readily separated from each
part and proportionate contributions readily determined. 25/ This
procedure for analyzing differences in classing is the only one yet
known to have been suggested that can be used both in measuring relative
magnitudes of parts of variability contributed to the total from
different detected sources and in interpreting the degree of
significance of differences between magnitudes of contributions from
different sources.

25/ See the following:

a. "Variance Analysis of Variability in Paired and Replicate
   Series of Staple-length Observations on Cotton Samples,"
   by F. H. Harper, W. B. Lanham, and O. T. Weaver (Journal
   of Farm Economics - July, 1934, Vol. XVI, No. 3, pages
   529-530).

b. "Measurement of Average Differences between Paired
   Observations on Staple Length of Cotton Samples,"
   by O. T. Weaver, W. B. Lanham, and F. H. Harper (Journal
   of Farm Economics - July, 1934, Vol. XVI, No. 3, pages
   534-535).

c. "The Analysis of Variance Method of Measuring Differences
   between Staple-length Designations of Press-Box and
   Cut Samples of Cotton," by F. H. Harper and W. B. Lanham
   (Mimeographed report issued by the Department in
   October, 1933).

d. Numerous office reports prepared in the Grade and Staple
   Statistics Section of the Division of Cotton Marketing
   on classing differences. Copies of these reports are
   on file in the library of the Division of Cotton
   Marketing.

This method of procedure is the best one known for emphasizing
the difference between "bias" and "spread" in classing.


According to the calculations following table 5,
$4\sigma_c^2 + 2\sigma_u^2$ is equal to 67.9660, the square of the
standard deviation of the column of x + y values. The calculated
correlated item for gin is -4.3875, whereas the calculated uncorrelated
item for error plus season is 42.7580. Four times the correlated item is
-17.5500 and two times the uncorrelated item is 85.5160. The total of
these two products is 67.9660, which verifies the statement that this
quantity represents $4\sigma_c^2 + 2\sigma_u^2$, the correlated item
being, as observed, a negative quantity.

The results of the second part of the calculations by which the
two parts of variability are separated show that
$9\sigma_c^2 + 3\sigma_u^2$ equals 885.3600, the square of the standard
deviation of the two x and y summations, 366.91 and 307.40. The
calculated correlated item for season is 119.1634, and the calculated
uncorrelated item for gin and error is -62.3702. Nine times the
correlated item is 1072.4706. Three times the uncorrelated item is
-187.1106. The algebraic sum of the two products is 885.3600,
representing $9\sigma_c^2 + 3\sigma_u^2$.
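Both verifications described above can be repeated from the printed
figures alone. The sketch below uses only numbers quoted in the text;
the variable names are illustrative.

```python
from statistics import pvariance

# First part: gin is the correlated item, season + error the uncorrelated.
gin_c, season_error_u = -4.3875, 42.7580
check_first = 4 * gin_c + 2 * season_error_u

# Second part: season is the correlated item, gin + error the uncorrelated.
season_c, gin_error_u = 119.1634, -62.3702
check_second = 9 * season_c + 3 * gin_error_u

# Square of the standard deviation of the two seasonal summations.
var_of_totals = pvariance([366.91, 307.40])

print(round(check_first, 4), round(check_second, 4), round(var_of_totals, 4))
```

The second weighted sum and the variance of the two summations agree to
within the rounding carried in the printed values.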

After the number of stratifications has been determined, it is
possible to logically allocate a given aggregate of samples among the
producing states according to the ratio that the figure representing
volume or probable volume of ginnings by the gin or gins proposed for
each statistical stratification bears to the figure representing the
total summation of ginnings by all gins proposed to constitute the
sample. For statements pertaining to determination of stratifications
and to the allocation of samples see the following office reports,
copies of which are on file in both the Grade and Staple Statistics
Section and in the library of the Division of Cotton Marketing.

a. "Analysis of Variability in Staple Length of Cotton
   Ginned During the Seasons of 1928-29, 1929-30, and
   1930-31 by Certain Gins and Size of Sample Calculated
   for 1931-32," September, 1931, pages 184-193.

b. "Analysis of Variability in Staple Length of Cotton
   Ginned During the Seasons of 1930-31 and 1931-32 by
   Certain Gins and Size of Sample Calculated for 1932-33,"
   September, 1932, pages 50-57.

c. "Analysis of Variability in Staple Length of Cotton
   Ginned During the Seasons of 1931-32 and 1932-33
   by Certain Gins and Allocation of Calculated Sample
   for 1933-34," July 28, 1933, pages 32-33.

d. "Procurement of Samples by the Grade and Staple Section
   and Suggested Apportionment of the Aggregate of Samples
   for the 1934-35 Season," April 28, 1934, pages 48-51.

It is contemplated that more comprehensive reports will be
prepared in future years.
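The proportional allocation described above can be illustrated with a
small sketch. The stratum labels, ginning volumes, and sample aggregate
below are hypothetical figures chosen only for illustration; they are
not taken from the reports cited, and the function name `allocate` is
likewise an assumption.

```python
def allocate(total_samples, ginnings_by_stratum):
    """Share an aggregate of samples in proportion to volume of ginnings."""
    total_ginnings = sum(ginnings_by_stratum.values())
    return {
        stratum: total_samples * volume / total_ginnings
        for stratum, volume in ginnings_by_stratum.items()
    }

# Hypothetical volumes of ginnings (bales) for three stratifications.
hypothetical_ginnings = {
    "stratum 1": 50_000,
    "stratum 2": 30_000,
    "stratum 3": 20_000,
}

shares = allocate(1_000, hypothetical_ginnings)
for stratum, n in shares.items():
    print(f"{stratum}: {n:.0f} samples")
```

In practice the resulting shares would need rounding to whole samples,
a step this sketch leaves out.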




